South Bačka District
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.40)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Toward Scalable and Valid Conditional Independence Testing with Spectral Representations
Frohlich, Alek, Kostic, Vladimir, Lounici, Karim, Perazzo, Daniel, Pontil, Massimiliano
Conditional independence (CI) is central to causal inference, feature selection, and graphical modeling, yet it is untestable in many settings without additional assumptions. Existing CI tests often rely on restrictive structural conditions, limiting their validity on real-world data. Kernel methods using the partial covariance operator offer a more principled approach but suffer from limited adaptivity, slow convergence, and poor scalability. In this work, we explore whether representation learning can help address these limitations. Specifically, we focus on representations derived from the singular value decomposition of the partial covariance operator and use them to construct a simple test statistic, reminiscent of the Hilbert-Schmidt Independence Criterion (HSIC). We also introduce a practical bi-level contrastive algorithm to learn these representations. Our theory links representation learning error to test performance and establishes asymptotic validity and power guarantees. Preliminary experiments suggest that this approach offers a practical and statistically grounded path toward scalable CI testing, bridging kernel-based theory with modern representation learning.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (4 more...)
Outcome-Aware Spectral Feature Learning for Instrumental Variable Regression
Meunier, Dimitri, Wornbard, Jakub, Kostic, Vladimir R., Moulin, Antoine, Fröhlich, Alek, Lounici, Karim, Pontil, Massimiliano, Gretton, Arthur
We address the problem of causal effect estimation in the presence of hidden confounders using nonparametric instrumental variable (IV) regression. An established approach is to use estimators based on learned spectral features, that is, features spanning the top singular subspaces of the operator linking treatments to instruments. While powerful, such features are agnostic to the outcome variable. Consequently, the method can fail when the true causal function is poorly represented by these dominant singular functions. To mitigate, we introduce Augmented Spectral Feature Learning, a framework that makes the feature learning process outcome-aware. Our method learns features by minimizing a novel contrastive loss derived from an augmented operator that incorporates information from the outcome. By learning these task-specific features, our approach remains effective even under spectral misalignment. We provide a theoretical analysis of this framework and validate our approach on challenging benchmarks.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
AutoSAGE: Input-Aware CUDA Scheduling for Sparse GNN Aggregation (SpMM/SDDMM) and CSR Attention
Sparse GNN aggregations (CSR SpMM/SDDMM) vary widely in performance with degree skew, feature width, and GPU micro-architecture. We present AutoSAGE, an input-aware CUDA scheduler that chooses tiling and mapping per input using a lightweight estimate refined by on-device micro-probes, with a guardrail that safely falls back to vendor kernels and a persistent cache for deterministic replay. AutoSAGE covers SpMM and SDDMM and composes into a CSR attention pipeline (SDDMM -> row-softmax -> SpMM). On Reddit and OGBN-Products, it matches vendor baselines at bandwidth-bound feature widths and finds gains at small widths; on synthetic sparsity and skew stress tests it achieves up to 4.7x kernel-level speedups. We release CUDA sources, Python bindings, a reproducible harness, and replayable cache logs.
- Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.40)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > France > Île-de-France > Hauts-de-Seine > Nanterre (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
- Information Technology > Mathematics of Computing (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.04)
- Europe > Netherlands (0.04)
- (2 more...)
- Energy (0.46)
- Information Technology (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Communications (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.92)
- Overview (0.67)
- Health & Medicine (0.67)
- Education (0.46)
- Information Technology (0.45)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.61)
Edge-aware baselines for ogbn-proteins in PyTorch Geometric: species-wise normalization, post-hoc calibration, and cost-accuracy trade-offs
Stanković, Aleksandar, Lisica, Dejan
We present reproducible, edge-aware baselines for ogbn-proteins in PyTorch Geometric (PyG). We study two system choices that dominate practice: (i) how 8-dimensional edge evidence is aggregated into node inputs, and (ii) how edges are used inside message passing. Our strongest baseline is GraphSAGE with sum-based edge-to-node features. We compare LayerNorm (LN), BatchNorm (BN), and a species-aware Conditional LayerNorm (CLN), and report compute cost (time, VRAM, parameters) together with accuracy (ROC-AUC) and decision quality. In our primary experimental setup (hidden size 512, 3 layers, 3 seeds), sum consistently beats mean and max; BN attains the best AUC, while CLN matches the AUC frontier with better thresholded F1. Finally, post-hoc per-label temperature scaling plus per-label thresholds substantially improves micro-F1 and expected calibration error (ECE) with negligible AUC change, and light label-correlation smoothing yields small additional gains. We release standardized artifacts and scripts used for all of the runs presented in the paper.